Deep Sequencing Insights in Therapeutic shRNA 
Processing and siRNA Target Cleavage Precision 

Hubert Denise, Sterghios A. Moschos, Benjamin Sidders, Frances Burden, Hannah Perkins, Nikki Carter, Tim Stroud, 
Michael Kennedy, Sally-Ann Fancy, Cris Lapthorn, Helen Lavender, Ross Kinloch, David Suhy and Romu Corbau 

TT-034 (PF-05095808) is a recombinant adeno-associated virus serotype 8 (AAV8) agent expressing three short hairpin RNA 
(shRNA) pro-drugs that target the hepatitis C virus (HCV) RNA genome. The cytosolic enzyme Dicer cleaves each shRNA 
into multiple, potentially active small interfering RNA (siRNA) drugs. Using next-generation sequencing (NGS) to identify and 
characterize active shRNA maturation products, we observed that each TT-034-encoded shRNA could be processed into as 
many as 95 separate siRNA strands. Few of these appeared active as determined by Sanger 5' RNA Ligase-Mediated Rapid 
Amplification of cDNA Ends (5' RACE) and through synthetic shRNA and siRNA analogue studies. Moreover, NGS scrutiny 
applied to 5' RACE products (RACE-seq) suggested that synthetic siRNAs could direct cleavage in not one, but up to five 
separate positions on targeted RNA, in a sequence-dependent manner. These data support an on-target mechanism of action 
for TT-034 without cytotoxicity and question the accepted precision of substrate processing by the key RNA interference (RNAi) 
enzymes Dicer and siRNA-induced silencing complex (siRISC). 
Introduction 

Chronic hepatitis caused by hepatitis C virus (HCV) affects 
170 million people worldwide, progresses to liver cirrhosis in 
about 1 in 4 cases and is associated with increased risk of 
hepatocellular carcinoma. The current standard of combination 
therapy-based care is not effective against all genotypes, 
exhibits variable cure rates and is associated with significant 
side effects. Moreover, even the latest management strategies 
require chronic dosing and have already been shown to 
be prone to viral escape. 

It has been suggested that direct targeting of the HCV 
genome could provide a valuable alternative. High genomic 
variability is encountered in HCV due to error-prone replication. 
Because the HCV genome is propagated exclusively as RNA, 
several direct targeting approaches have been proposed, 
including RNA interference (RNAi). A high efficiency alternative 
to synthetic RNAi delivery systems, however, involves 
the use of recombinant viruses possessing capsid-specific 
tropism for different organs. In these systems, siRNAs are 
delivered in the form of short hairpin RNA (shRNA) precursors 
expressed from virus-encoded DNA plasmids. 

An example of such an approach is TT-034 (also known as 
PF-05095808), a recombinant adeno-associated virus (AAV) 
vector with hepatic tropism conferred by its serotype 8 capsid 
(AAV8). The virus has been modified to encode three shRNAs 
against HCV (Figure 1a), two of which were designed in 
a passenger strand-loop-guide strand orientation (Figure 1b). 
We previously reported dose-dependent activity of these shRNAs, 
delivered individually or together as TT-034, against a 
HCV genotype 1 replicon model in vitro. In those experiments, 
we did not observe activation of innate immune responses, nor did we 
detect replicon inhibition following transduction with empty capsid 
vector, confirming that the activity of TT-034 was specifically 
due to delivery of the shRNAs. 

Short hairpin RNAs are transcribed in the nucleus and are 
exported into the cytoplasm via Exportin 5 to be matured into 
RNAi-active siRNA. In agreement with reports on microRNAs 
(miRNAs), shRNA maturation involves multiple cleavages of 
the loop structure in the hairpin by the cytoplasmic RNase III 
enzyme Dicer. Dicer is thought to cleave shRNA at 21-25 
base pair intervals from the stem-end of the shRNA, resulting 
in loop removal and generation of double-strand siRNAs. 
One of the strands, known as the guide or active strand, is 
then loaded into one of four Argonaute (AGO) proteins forming 
the siRNA-induced silencing complex (siRISC; reviewed in ref. 
"). Thermodynamic stability design of synthetic siRNA duplex 
ends can direct siRISC loading. Lack of such design flexibil- 
ity, however, is valuable to TT-034 as HCV replication involves 
a negative polarity RNA intermediate. Thus, both strands of the 
three shRNAs encoded in TT-034 could be active against HCV. 

After AGO is loaded, siRISC uses the guide strand to identify 
RNA harboring complementary sequences and, if AGO2 
is involved, directs endonucleolytic cleavage of the target phosphodiester 
backbone specifically opposite nucleotides 10-11 
from the 5' end of the guide strand. The result is believed to 
be a single, novel 5' end on the RNA target, typically confirmed 
by the Sanger sequencing-based method 5' RNA Ligase-Mediated 
Rapid Amplification of cDNA Ends (5' RACE). Importantly, 
such data are the only means of obtaining definitive proof 
of mechanism for candidate siRNA and shRNA therapeutics. 
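
The positional rule described above can be made concrete with a short Python sketch. It is an illustration only and not part of the reported analysis: the function names, the 14-nt anchor used to locate the hybridization site, and the 0-based indexing are assumptions introduced here. The guide is the 5' arm of s-shRNA22 as printed in Table 1, and the target is the Con1b +RNA region printed in Table 3.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def expected_cleavage_fragment(target, guide, anchor_len=14):
    """Predict the canonical AGO2 cut on a target strand.

    The cut is placed between the target bases paired with guide
    nucleotides 10 and 11 (counted from the guide 5' end). Returns the
    0-based index of the first base of the downstream fragment and the
    stretch of target paired with guide nucleotides 10 down to 1.
    """
    guide_dna = guide.upper().replace("U", "T")
    site = reverse_complement(guide_dna[:anchor_len])
    start = target.upper().find(site)
    if start < 0:
        raise ValueError("guide anchor not found in target")
    pos1 = start + anchor_len - 1   # target base paired with guide nt 1
    cut = pos1 - 9                  # target base paired with guide nt 10
    return cut, target[cut:pos1 + 1]


# 5' arm of s-shRNA22 (Table 1) and its target region (Table 3).
GUIDE = "AUUGGAGUGAGUUUAAGCU"
TARGET = "AAACTCACTCCAATcccggc"

cut_index, fragment_start = expected_cleavage_fragment(TARGET, GUIDE)
print(cut_index, fragment_start)
```

Under these assumptions the predicted downstream fragment begins with TCACTCCAAT, which is the novel 5' end listed for s-shRNA22 in Table 2 (Resolution 1).
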
In this article, we explore the utility of next-generation 
sequencing (NGS) in scrutinizing shRNA maturation and 
siRNA cleavage, as a means of fully characterizing the 
potentially active drug products and specific mode of action 
of TT-034. Our investigations reveal that shRNA can yield an 
unexpectedly large diversity of putatively RNAi-active strands 
and an increased number of cleavage sites on targeted RNA. 
Moreover, we report that siRNA can direct target cleavage at 
more than one position on the targeted RNA, in a sequence-dependent 
manner. 

Results 

TT-034-encoded shRNA are processed into an unexpectedly 
diverse panel of putative siRNA strand products 

In order to characterize the diversity of siRNAs produced 
by the three anti-HCV shRNAs expressed from TT-034, we 
treated Con1b HCV replicon cells with TT-034. This resulted in 
replicon inhibition as measured by reporter activity (Table 1). 
RNA was therefore extracted and subjected to paired-end small 
RNA NGS (sRNA-seq). The sum of sRNA-seq reads aligning 
to the TT-034-encoded shRNAs represented an average of 
0.063% of all sequenced RNA, with 1,712 (0.016%) molecules 
originating from shRNA6, 418 (0.004%) from shRNA19 and 
3,718 (0.042%) from shRNA22 (Supplementary Table S2). 
The relative frequency of these putative siRNA strands indicated 
uneven hairpin expression, corroborating previous 
data assessing production of guide strands against the (+) 
RNA strand of the HCV genome as determined by qPCR. 
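
The unevenness can be expressed as each hairpin's share of the shRNA-aligning reads. The following minimal sketch uses only the three read counts quoted above; the function name `hairpin_shares` is introduced here for illustration.

```python
def hairpin_shares(read_counts):
    """Return each hairpin's fraction of all hairpin-aligning reads."""
    total = sum(read_counts.values())
    return {name: count / total for name, count in read_counts.items()}


# Read counts quoted in the text for the three TT-034-encoded hairpins.
READS = {"shRNA6": 1712, "shRNA19": 418, "shRNA22": 3718}

shares = hairpin_shares(READS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

On these counts, shRNA22 accounts for roughly 64% of the hairpin-aligning reads, shRNA6 for roughly 29% and shRNA19 for roughly 7%.
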

Differences in putative siRNA strand length were 
also observed (Figure 2a-c): the geometric means of 
shRNA-aligning sequence lengths were similar between 
shRNA6 and shRNA19 (19.9 and 19.5 nt respectively), 
Figure 1 Genome organization and target site location of TT-034, a recombinant AAV8 vector encoding three shRNA targeting the HCV 
genome. (a) The viral genome of AAV8 has been replaced with 3 expression cassettes separated by Pol III termination sequences and stuffer 
regions. The contents of the expression cassettes are from 5' to 3': modified U6-9 Pol III promoter (box A) driving shRNA22 expression targeted 
to the HCV NS5B region (viral RNA-dependent RNA polymerase), modified U6-1 Pol III promoter (box B) driving shRNA19 expression, also 
targeted to the NS5B region and modified U6-8 Pol III promoter (box C) driving shRNA6 targeted to the 5' UTR of HCV. The expression 
cassettes are flanked by inverted tandem repeats derived from AAV4 at the 5' end and AAV2 at the 3' end of the genome. (b) Schematic 
representation of the expressed hairpin polarity. The guide strand is defined as the strand targeting the (+) RNA HCV genome. 



Table 1 RNAi agents used to prepare samples for 5' RACE and RACE-Seq 

| Treatment | Sequence | Concentration | Luciferase inhibition (%) |
|---|---|---|---|
| TT-034 | shRNA6, shRNA19 and shRNA22 | 1 0" vg/cells | 69.3±9.2 |
| s-shRNA6 | 5'-CGCGAAAGGCCUUGUGGUACUgaagcuugAGUACCACAAGGCCUUUCGCuuuuu-3' | 500 pmol/l | 77.2±12.2 |
| s-shRNA19 | 5'-GUCAACUCCUGGCUAGGCAAuuuguguagUUGCCUAGCCAGGAGUUGACuuuuuu-3' | 800 pmol/l | 61.4±5.3 |
| s-shRNA22 | 5'-AUUGGAGUGAGUUUAAGCUgaagcuugAGCUUAAACUCACUCCAAUuuuuu-3' | 500 pmol/l | 92.7±1.4 |
| siRNA6 | 5'-gAGUACCACAAGGCCUUUCGC-3'; 5'-GAAAGGCCUUGUGGUACUgaa-3' | 40 nmol/l | 88±6.2 |
| siRNA19 | 5'-uagUUGCCUAGCCAGGAGUUGAC-3'; 5'-CAACUCCUGGCUAGGCAAuuugu-3' | 10 nmol/l | 89.6±2.6 |
| siRNA22 | 5'-AGCUUAAACUCACUCCAAUuu-3'; 5'-AUUGGAGUGAGUUUAAGCUga-3' | 500 pmol/l | 93.4±2.3 |

Nucleotides constituting the shRNA loop sequence are indicated in lowercase. The loop-derived nucleotide in bold is not mismatched to the target. siRNA sequences are given as [Sense]; [Antisense]. 
but significantly larger for shRNA22 (23.5 nt). Similarly, the 
range of putative siRNA strand lengths was comparable 
between shRNA6 and shRNA19 (15-32 nt and 15-30 nt 
respectively), with shRNA22 yielding sequences between 
15 and 37 nt in length (Supplementary Table S2). This range 
was found to be on account of an unexpectedly large diversity 
of sequences matured from each hairpin: an average 
of 95, 67, and 89 different sequences corresponding to 
shRNA6, shRNA19, and shRNA22 were observed respectively 
(Supplementary Table S2). While some of these 
sequences were detected only once, the most predominant 
product was detected up to 677, 145, and 1,075 times 
for each of the shRNAs. Thus, while a single sequence 
represented up to 35% of all putative siRNA strands, the 
Figure 2 Length and sequence diversity of siRNA strands processed out of the three HCV-targeting shRNA expressed from TT-034. Following 
in vitro transduction of Con1b HCV replicon cells with TT-034, small RNA NGS was performed on RNA extracts to determine the diversity of 
siRNA guide and passenger strands processed out of TT-034-encoded hairpins. (a-c) The distribution of guide (black circle) and passenger 
(grey X) strand lengths is indicated for each of the three shRNAs encoded in TT-034. (d-f) The cumulative incidence (y axis, logarithmic scale) 
of distinct 5' ends (black bars) and 3' ends (white bars) processed out of the three shRNAs relative to the TT-034-encoded hairpin sequence 
(x axis) is also shown as an indicator of sequence diversity. 
remainder were noteworthy for their breadth of diversity: 
5' end variability ranged between 10 and 25 distinct positions, 
whereas sequences starting from the same 5' end 
could terminate in up to 22 different 3' ends, as observed 
for shRNA22 (Figure 2d-f and Supplementary Table 
S2). Moreover, while shRNA6 and shRNA19 were both 
processed to yield sequences from both hairpin strands 
(Supplementary Table S2), the shRNA22 hairpin was predominantly 
processed in favor of the guide strand (99.89% 
of shRNA22-aligning reads). Mechanistically, however, this 
shRNA product diversity meant that a 2.65%, 24.66%, and 
6.19% fraction of the putative siRNA strands of shRNA6, 
shRNA19, and shRNA22, respectively, contained nucleotides 
from the TT-034 vector backbone or from the shRNA 
loop and therefore bore mismatches in their seed sequence 
to their target in the HCV genome. 
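
The diversity measures used in this section (number of distinct sequences, distinct 5' ends, 3' ends per 5' end, and geometric mean length) can be tallied from hairpin-aligning reads as in the sketch below. This is a hypothetical illustration rather than the pipeline used in the study: the toy reads are invented, exact substring matching stands in for a real aligner, and the function name `summarize_reads` is an assumption. The hairpin string is the s-shRNA22 sequence printed in Table 1.

```python
import math
from collections import defaultdict

# s-shRNA22 hairpin as printed in Table 1 (case is dropped for matching).
HAIRPIN = "AUUGGAGUGAGUUUAAGCUgaagcuugAGCUUAAACUCACUCCAAUuuuuu".upper()


def summarize_reads(reads, hairpin=HAIRPIN):
    """Summarize length and end diversity of reads matching a hairpin exactly."""
    ends_by_start = defaultdict(set)   # 5' position -> set of 3' positions
    lengths = []
    for read in reads:
        start = hairpin.find(read)     # first exact match, -1 if none
        if start < 0:
            continue                   # read does not align; skip it
        ends_by_start[start].add(start + len(read))
        lengths.append(len(read))
    if not lengths:
        raise ValueError("no reads aligned to the hairpin")
    geo_mean = math.exp(sum(math.log(n) for n in lengths) / len(lengths))
    return {
        "aligned_reads": len(lengths),
        "distinct_sequences": sum(len(e) for e in ends_by_start.values()),
        "distinct_5p_ends": len(ends_by_start),
        "max_3p_ends_per_5p_end": max(len(e) for e in ends_by_start.values()),
        "geometric_mean_length": geo_mean,
    }


# Invented toy reads: two 3' variants sharing one 5' end, plus one shifted read.
toy_reads = [HAIRPIN[0:21]] * 3 + [HAIRPIN[0:22]] * 2 + [HAIRPIN[1:22]]
summary = summarize_reads(toy_reads)
print(summary)
```

A production analysis would need a proper aligner, since exact matching ignores sequencing errors and reads that extend into the vector backbone.
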

Dicer is the key cytosolic enzyme responsible for both 
shRNA and miRNA maturation. To eliminate the possibility 
that the plethora of shRNA products encountered in our 
studies were the consequence of deficient Dicer processing 
in this cell line, we queried our dataset for evidence of 
aberrant endogenous miRNA maturation. In stark contrast to 
our observations on shRNA processing and in accordance 
with previously published NGS data on mature miRNA diversity 
archived on miRBase and elsewhere, an average of 
2.9 variants were observed among the 147 separate mature 
miRNA detected in these cells, with mature miRNA diversity 
essentially limited to the 3' end (Supplementary Table S3). 

TT-034-encoded shRNAs induce an increased number 
of specific cleavages on the HCV replicon RNA genome 
consistent with an RNAi mechanism of action 

To understand the impact of this diversity of putative siRNA 
strands, we engaged in studies on the effect of the TT- 
034-encoded shRNA on the HCV replicon genome. A cur- 
sory examination of the sRNA-seq dataset identified a limited 
number of reads fully aligning to the HCV replicon, some of 
which could be a consequence of RNAi (Supplementary 
Table S4). This was an interesting result given the technol- 
ogy limitations in detecting sequences >80 nt, as direct action 
of TT-034 putative siRNA strands should yield HCV replicon 
genome fragments >400 nt. 

As the anticipated mode of action of TT-034 was RNAi, we 
developed 5' RACE assays to examine if the three shRNA 
products could direct the generation of novel 5' ends in the 
expected locations of the HCV replicon RNA genome. Elec- 
trophoretic analysis of the 5' RACE products before and 
after nested PCR re-amplification was in agreement with this 
hypothesis, since all assays yielded bands of the expected 
size and no amplicons could be detected in the absence 
of TT-034 (Figure 3). Unexpectedly, reactions designed to 
detect shRNA6 activity also yielded high molecular weight 
bands. The size of these bands raised the possibility they 
might reflect 5' RACE products from full-length HCV replicon 
RNA, given the target site of shRNA6 is proximal to the 5' end 
of the HCV genome (Figure 1a). Sanger sequencing of these 
bands was in agreement with this proposition, indicating that 
the 5' RACE RNA adapter had indeed ligated some 271 
nucleotides upstream of the shRNA6 target site. On the other 
hand, Sanger sequencing of the 5' RACE bands produced 
Figure 3 Evidence of on-target mode of action for TT-034 by 5' 
RACE. Electrophoretic analysis of 5' RACE products and nested 
PCR re-amplification of 5' RACE products generated on the 
HCV replicon RNA genome on account of the action of TT-034, 
transfected synthetic siRNA analogues or transfected synthetic 
shRNA analogues (s-shRNA) of shRNA6 (target 6), shRNA19 
(target 19), and shRNA22 (target 22). The control lanes correspond 
to 5' RACE amplification using primers specific for target site 6 in 
the presence of synthetic shRNA19 (lane C1), or in the absence of 
TT-034 using primers for the target site of shRNA19 (lane C2). White 
arrows identify 5' RACE products generated by shRNA6 versus full- 
length HCV replicon RNA (large molecular weight bands). 

from target cleavage at the expected site of action yielded 
an interesting discrepancy; the first 44 nt at the 5' end of the 
sequencing results were consistently ambiguous irrespective 
of the target site under examination (Supplementary Tables 
S5 and S6). Given the 5' RACE adapter was also 44 nt long, 
we resolved this ambiguous region into superposed indi- 
vidual sequences composed of replicon fragments ligated to 
the 5' RACE adapter. This analysis revealed that two different 
novel 5' ends were generated on the HCV replicon +RNA 
genome in each site targeted by the three shRNA encoded 
in TT-034 (Table 2). 
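
The resolution step can be illustrated by running it in reverse: superposing two adapter-ligated fragments and writing mixed positions with IUPAC ambiguity codes. The sketch below makes several assumptions that are not stated in the text: the two products are taken to be in register over the shared replicon sequence (right-aligned), the adapter tail `GTAGAAA` is read off the siRNA rows of Table 2 rather than from the adapter specification, and both products are treated as equally abundant. The fragments are the two shRNA19 resolutions listed in Table 2.

```python
# Two-base IUPAC ambiguity codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CG"): "S", frozenset("AT"): "W",
}


def superpose_right_aligned(seq_a, seq_b):
    """Merge two sequences, aligned at their 3' ends, into one IUPAC string."""
    n = min(len(seq_a), len(seq_b))
    pairs = zip(seq_a[-n:], seq_b[-n:])
    return "".join(IUPAC[frozenset((x, y))] for x, y in pairs)


ADAPTER_TAIL = "GTAGAAA"         # assumed, inferred from Table 2 siRNA rows
RESOLUTION_1 = "TGGCTAGGCAA"     # shRNA19 target, Table 2
RESOLUTION_2 = "GGCTAGGCAA"      # shRNA19 target, Table 2

mixed = superpose_right_aligned(ADAPTER_TAIL + RESOLUTION_1,
                                ADAPTER_TAIL + RESOLUTION_2)
print(mixed)
```

Under these assumptions the merged string is KWRRAAW followed by GGCTAGGCAA, which agrees with the TT-034 5' RACE output for this target in Table 2 (kwrraat followed by GGCTAGGCAA) at all but the last ambiguous position, where the table reports a single base call.
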

siRNA-directed RNAi has been reported to direct cleavage 
of complementary RNA targets opposite positions 10-11 
from the 5' end of the siRNA guide strand loaded in RISC. 
To assess if the observed sequencing results could be a 
consequence of an RNAi mode of action, we re-examined 
the sRNA-seq data set for TT-034 shRNA products that 
could account for the Sanger 5' RACE results. Such putative 
RNAi mediators were successfully identified for all the novel 
5' ends on the HCV replicon RNA genome (Table 3). Inter- 
estingly, the relative frequency of these sequences varied 
substantially both within and between biological replicates. 
Moreover, despite the presence of as many as six continuous 
mismatches in the 5' ends of these putative guide strands 
(e.g., shRNA22; Table 3), activity consistent with an RNAi 
mode of action appeared to be retained. 

Synthetic analogues of the TT-034-encoded shRNA 
exhibit comparable activity consistent with an RNAi 
mechanism of action 

To clarify further whether the TT-034-encoded shRNA were 
indeed acting via an RNAi mechanism of action, we evaluated 
the inhibitory potential and cleavage activity of three synthetic 
Table 2 Resolution of the Sanger sequenced 5' RACE amplicons generated by TT-034, s-shRNA analogues of the TT-034-encoded shRNAs or synthetic siRNA 
representing the most abundant shRNA products yielded from TT-034 expression in Con1b HCV replicon cells 

Columns shRNA6, shRNA19 and shRNA22 give the Con1b HCV replicon +RNA genome target region. 

| RNAi agent | Sequence type | shRNA6 | shRNA19 | shRNA22 |
|---|---|---|---|---|
| None | Con1b +RNA (a) | 5'-...AAGGCCTTGTGGTACTg...-3' | 5'-...AATTCCTGGCTAGGCAA...-3' | 5'-...CTCAAACTCACTCCAAT...-3' |
| TT-034 | 5' RACE output | 5'-...gwarcctTGTGGTACTG...-3' | 5'-...kwrraatGGCTAGGCAA...-3' | 5'-...aarrrkwrraatCCAAT...-3' |
| | Resolution 1 | 5'-CCTTGTGGTACTG...-3' | 5'-TGGCTAGGCAA...-3' | 5'-TCCAAT...-3' |
| | Resolution 2 | 5'-TGTGGTACTG...-3' | 5'-GGCTAGGCAA...-3' | 5'-CCAAT...-3' |
| s-shRNA | 5' RACE output | 5'-...gaagcctTGTGGTACTG...-3' | 5'-...gwrgaaaGGCTAGGCAA...-3' | 5'-...rkwgraatCACTCCAAT...-3' |
| | Resolution 1 | 5'-CCTTGTGGTACTG...-3' | 5'-TGGCTAGGCAA...-3' | 5'-TCACTCCAAT...-3' |
| | Resolution 2 | 5'-TGTGGTACTG...-3' | 5'-GGCTAGGCAA...-3' | 5'-CACTCCAAT...-3' |
| siRNA | 5' RACE output | 5'-...gtagaaaTGTGGTACTG...-3' | 5'-...ggrgtagaaatAGGCAA...-3' | 5'-...rkwgaaatCACTCCAAT...-3' |
| | Resolution 1 | 5'-TGTGGTACTG...-3' | 5'-TAGGCAA...-3' | 5'-TCACTCCAAT...-3' |
| | Resolution 2 | | | 5'-CACTCCAAT...-3' |

(a) The TT-034-encoded siRNA hybridization site is identified in capital letters. The 5' end of s-shRNA22 aligns to the last capital thymidine base in the Con1b 
+RNA genome. 



Table 3 Frequency of sequences processed out of TT-034-encoded shRNAs detected by paired-end small RNA NGS that could account for novel 5' ends 
generated on the Con1b HCV replicon +RNA genome by an RNAi mechanism of action 

Columns shRNA6, shRNA19 and shRNA22 give the TT-034 hairpin-aligning reads (frequency (range)) (a). 

| Sequence type | shRNA6 | shRNA19 | shRNA22 |
|---|---|---|---|
| Con1b +RNA target (b) | 5'-...CGAAAGGCCTTGTGGTACTgcctgat...-3' | 5'-...CAATTCCTGGCTAGGCAAcat...-3' | 5'-...AAACTCACTCCAATcccggc...-3' |
| Reads aligning to cleavage site 1 | 3'-...GGAACACCAT-5' | 3'-...ACCGATCCGT-5' | 3'-...AGGTTAtgat-5' |
| Frequency (range) | 62.5 (47-78)% | 12.5 (7-17.7)% | 0.1 (0-0.24)% |
| Reads aligning to cleavage site 2 | 3'-...ACACCATGAg-5' | 3'-...CCGATCCGTT-5' | 3'-...GGTTAtgatc-5' |
| Frequency (range) | 0.7 (0.3-1.1)% | 34.9 (10.8-58.9)% | 0.2 (0-0.4)% |

(a) Data represents the sum of TT-034-aligning sequence reads with a common 5' end. The first ten bases of the sequences are presented aligned to the 
corresponding hybridization site on the Con1b +RNA target region. (b) Bases in bold represent the cleavage points detected by Sanger sequencing of 5' RACE 
products (Table 3). 



shRNA (s-shRNA) mimics of the hairpins encoded in TT-034 
(Table 1). All three s-shRNAs were found to be active against 
the HCV replicon (Table 1) and therefore RNA was extracted 
and subjected to Sanger 5' RACE analysis. Interestingly, 
s-shRNA6 and s-shRNA19 directed cleavage of the HCV replicon 
RNA genome at precisely the same positions as the 
TT-034-encoded shRNAs (Figure 3; Table 2), suggesting a 
common mechanism of action. However, s-shRNA22 resulted in 
two novel cleavage sites 4 bases upstream of those detected 
after treatment with TT-034 (Table 2). This apparent inconsistency 
was attributed to the different orientation design of the 
three hairpins (Figure 1b). Maturation of shRNA22 was shown 
to yield putative guide strands with up to 6 nucleotides upstream 
of the shRNA22 hairpin encoded in TT-034 (Figure 2f); these 
upstream sequences were absent from s-shRNA22 (Table 1), 
eliminating putative guide strand production that could incorporate 
these nucleotides. In contrast, the shRNA6 and shRNA19 
guide strand populations incorporated loop sequences, which 
were retained in the s-shRNA analogues. Notably, the novel 
cleavage sites directed by s-shRNA22 were 10 and 9 nt from 
the 5' end of the hairpin, suggesting that at least one of these 
cleavage sites was generated by a process consistent with 
an RNAi mode of action. We interpreted the second 5' end as 
the result of 5'->3' degradation of the primary cleavage product. 
These data, therefore, further supported the precept that 
the cleavages induced by TT-034 on the HCV replicon RNA 
genome might indeed be on account of an RNAi mechanism, 
despite the imprecision of shRNA processing observed. 



Not all of the putative siRNA guide strands processed 
out of TT-034 can induce cleavage of the HCV replicon 
RNA 

To understand which of the putative siRNA guide strands 
processed out of TT-034 harbored RNAi activity, we carried 
out focused in vitro studies on synthetic siRNA analogues 
of shRNA22 maturation products. To select candidates for 
these investigations, we ranked the putative guide strands 
produced out of shRNA22 (Figure 2c) by incidence and 
selected the top 20 sequences (Table 4). Of these, 11 
(60%) exhibited >50% inhibitory activity in the HCV replicon 
reporter assay when transfected at 30 nmol/l concentrations, 
representing 4 out of 6 (66%) of the different 5' ends tested. 
Interestingly, variability at the 5' end through addition of off- 
target nucleotides resulted in progressive activity reduction 
(17-39.5%). On the other hand, removal of nucleotides from 
the 5' end had a more substantial effect (51.3-88.2% activity 
reduction). In contrast, 3' end variability did not appear to 
drive or impact robust inhibitory potential. Thus, the activity of 
the seven most potent sequences with a common 5' end but 
different 3' ends varied by less than 6%. 

To validate that the observed replicon inhibition was on 
account of an RNAi mechanism of action, 5' RACE was per- 
formed following transfection with siRNA analogues of the 
most prominent maturation products of shRNA6, shRNA19 
and shRNA22 which exhibited comparable potency (Table 1). 
These studies identified only one predominant 5' RACE product 
for siRNA6 and siRNA19 consistent with an RNAi-induced 
Table 4 Activity of siRNAs (30 nmol/l; n of 5-8) directed against the shRNA22 target site 

| Guide | Passenger | Seed | Inhibition (%) | 95% CI |
|---|---|---|---|---|
| ATTGGAGTGAGTTTAAGCTG | GCTTAAACTCACTCCAATTT | TTGGAGT | 100 | 75.2 to 118.0 |
| ATTGGAGTGAGTTTAAGCTGA | AGCTTAAACTCACTCCAATTT | TTGGAGT | 97.9 | 74.8 to 115.6 |
| ATTGGAGTGAGTTTAAGCTGAA | GAGCTTAAACTCACTCCAATTT | TTGGAGT | 97.8 | 71 to 105.8 |
| ATTGGAGTGAGTTTAAGCTGAAGCTT | GCTTCAGCTTAAACTCACTCCAATAC | TTGGAGT | 96.4 | 93.7 to 99.0 |
| AAGCTTCAGCTTAAACTCACTCCAATAC | ATTGGAGTGAGTTTAAGCTGAAGCTTGA | TTGGAGT | 96.3 | 91.6 to 97.9 |
| ATTGGAGTGAGTTTAAGCTGAAGCT | CTTCAGCTTAAACTCACTCCAATAC | TTGGAGT | 94.7 | 99.0 to 98.9 |
| TATTGGAGTGAGTTTAAGCTGAAG | TCAGCTTAAACTCACTCCAATACT | ATTGGAG | 94.4 | 68.1 to 97.9 |
| TATTGGAGTGAGTTTAAGCTGAAGCT | CTTCAGCTTAAACTCACTCCAATACT | ATTGGAG | 83.0 | 66.2 to 99.6 |
| gTATTGGAGTGAGTTTAAGCTG | GCTTAAACTCACTCCAATACTA | tATTGGA | 82.9 | 54.5 to 94.9 |
| gTATTGGAGTGAGTTTAAGCT | CTTAAACTCACTCCAATACTA | tATTGGA | 74.7 | 33.7 to 89.5 |
| agTATTGGAGTGAGTTTAAGCTG | GCTTAAACTCACTCCAATACTAG | gtATTGG | 61.6 | 33.4 to 87.6 |
| TTGGAGTGAGTTTAAGCTGAAGCTTGAG | CAAGCTTCAGCTTAAACTCACTCCAATA | TGGAGTG | 60.5 | 30.6 to 66.8 |
| agtATTGGAGTGAGTTTAAGCT | CTTAAACTCACTCCAATACTAG | gtATTGG | 48.7 | 21.6 to 55.4 |
| TTGGAGTGAGTTTAAGCTGAAGCTTGA | AAGCTTCAGCTTAAACTCACTCCAATA | TGGAGTG | 38.5 | 14.4 to 42.4 |
| TGGAGTGAGTTTAAGCTGAAGCTTGAG | CAAGCTTCAGCTTAAACTCACTCCAAT | GGAGTGA | 28.4 | -24.2 to 43.8 |
| TGGAGTGAGTTTAAGCTGAAGCTT | GCTTCAGCTTAAACTCACTCCAAT | GGAGTGA | 16.9 | -8.5 to 24.7 |
| TGGAGTGAGTTTAAGCTGAAGCT | CTTCAGCTTAAACTCACTCCAAT | GGAGTGA | 16.5 | -32.0 to 47.4 |
| agtATTGGAGTGAGTTTAAGC | TTAAACTCACTCCAATACTAG | gtATTGG | 15.9 | -25.5 to 40.8 |
| TGGAGTGAGTTTAAGCTGAAGCTTG | AGCTTCAGCTTAAACTCACTCCAAT | GGAGTGA | 14.4 | -15.2 to 17.6 |

Small case nucleotides identify bases mismatched to the target RNA. 



mechanism of action; curiously, limited sequence ambiguity 
was also observed in the 5' RACE product of siRNA19 
(Supplementary Table S5). In contrast, siRNA22 resulted in 
two juxtaposed 5' RACE products identical to those obtained 
with the s-shRNA22 hairpin (Table 2). These results rein- 
forced our confidence that TT-034 activity was on account 
of RNAi. The true nature of the secondary 5' RACE prod- 
uct encountered with siRNA22 and s-shRNA22, though sus- 
pected to be a degradation product, remained unresolved. 

RACE-seq reveals siRNA-directed cleavage events can 
occur beyond position 10-11 

Perplexed over this apparent cleavage imprecision demon- 
strated by possibly two of the synthetic siRNAs tested by 
Sanger 5' RACE, we sought to study the outcome of their 
molecular action in more detail. We therefore designed 
new sets of gene-specific reverse transcription primers 
(Supplementary Table S1) that would yield short 5' RACE 
cDNA products, suitably sized for Illumina paired-end NGS 
(<80bp), or RACE-seq. Thus, we carried out independent 
studies on each of the three synthetic siRNA analogues of 
the TT-034-encoded shRNAs by transfecting the siRNAs 
separately into HCV replicon cells and subjecting total RNA 
extracts to RACE-seq. 

Taking into account the possibility of 5'->3' degradation of 
RNAi products, we expected the higher sensitivity of RACE-seq 
to yield a profile of progressive novel 5' ends starting from 
position 10-11 and heading downstream on the HCV genome 
target (i.e., -1, -2, etc. towards the 5' end of the siRNA guide 
strand; Supplementary Figure S1). Accordingly, analysis 
of the novel 5' ends generated from siRNA6 (Figure 4a,b), 
siRNA19 (Figure 4c,d) and siRNA22 (Figure 4e,f) yielded 
evidence of cleavage occurring opposite position 10-11 with 
variable detection of shorter products, consistent with an 
RNAi mechanism of action. To our surprise, however, all siRNAs 
yielded evidence of cleavage occurring beyond position 
10, i.e., at +1, +2, etc. away from the accepted site of siRNA-directed, 
AGO2-induced cleavage. Particularly, in the case of 
siRNA19 (Figure 4c), the profile obtained hinted at the possibility 
that at least two main cleavage sites might be favored 
by RISC at positions 10-11 and 13-14 for this siRNA. Apart 
from these, the majority of other novel 5' ends encountered 
in the siRNA target region occurred at frequencies below 5%. 
Detection of these low frequency novel 5' ends appeared sensitive 
to both 5' RACE primer optimization (Supplementary 
Figure S2a) and sequencing depth (Supplementary Figure 
S2b); however, their relative incidence remained largely unaffected. 
In stark contrast, neither of the RACE-Seq reactions 
performed on the no-transfection control and the mock transfection 
control yielded any reads within the targeted amplification 
regions. Furthermore, for each siRNA under investigation, 
no novel 5' end reads could be detected in the other two target 
regions, confirming that novel 5' end generation was specific 
to the target site in question and on account of siRNA-directed 
activity, rather than off-target effects or RNA degradation. Similarly, 
no novel 5' ends were detected in the antisense RNA 
replicon genome hybridization sites for siRNA19 (Figure 4d) 
and siRNA22 (Figure 4f), in accordance with previous data 
indicating activity of TT-034 on the antisense HCV replicon 
genome intermediate only at the hybridization site of siRNA6. 
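
The offset convention used above (0 for the canonical cut opposite positions 10-11, negative values for shorter products, positive values for cuts beyond position 10) can be tabulated from RACE-seq read start positions as in the following sketch. It is a hypothetical illustration: the read start positions are invented, the coordinate system is 0-based along the target strand, and the function name `five_prime_end_profile` is an assumption rather than part of the study's analysis.

```python
from collections import Counter


def five_prime_end_profile(read_starts, pos1_index):
    """Percentage of novel 5' ends at each offset from the canonical cut.

    read_starts: 0-based target positions of the first base of each read.
    pos1_index: target position paired with guide nucleotide 1.
    Offset 0 is a fragment starting at the base paired with guide nt 10;
    +n starts n bases further upstream on the target, -n starts n bases
    further downstream (a shorter product).
    """
    canonical_start = pos1_index - 9
    counts = Counter(canonical_start - start for start in read_starts)
    total = sum(counts.values())
    return {offset: 100.0 * n / total for offset, n in sorted(counts.items())}


# Invented example: 6 canonical reads, 3 reads cut three bases further
# upstream (opposite positions 13-14) and 1 read one base shorter.
toy_starts = [4] * 6 + [1] * 3 + [5]
profile = five_prime_end_profile(toy_starts, pos1_index=13)
print(profile)
```

With these toy counts the profile shows 60% of novel 5' ends at offset 0, 30% at +3 and 10% at -1, the kind of two-site pattern described for siRNA19.
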



Discussion 

Though considerable resources have been invested in 
understanding the biochemistry of RNAi, our efforts to 
develop a novel shRNA therapeutic necessitated full characterization 
of pro-drug (TT-034) maturation into an active 
drug form (siRNA guide strands). To date, NGS is the only 
methodology available to discriminate and relatively quantify 
single base pair differences in closely related sequences 
within a pure, but complex matrix of closely related nucleic 
acid materials. For this reason, we resorted to this approach 
Figure 4 Detection of single siRNA-mediated cleavage points in the sense and antisense strands of the HCV replicon RNA genome by RACE-Seq. 
Following in vitro transfection of HCV replicon cells with individual synthetic siRNAs, RACE-Seq was performed using gene-specific 
reverse transcription primers hybridizing immediately downstream of the target sites of (a,b) siRNA6, (c,d) siRNA19, or (e,f) siRNA22 on 
either (a,c,e) the sense or (b,d,f) the antisense RNA genomes of the HCV replicon. The incidence of novel 5' ends (y axis) on the HCV replicon 
genome within each siRNA hybridization site in a given orientation (x axis) is described. The orientation of potential 5'->3' RNA degradation 
is indicated using a right or left facing triangle. 



to characterize TT-034. Our results indicate that the core processes 
of shRNA maturation and siRNA mode of action might 
be considerably more elaborate than previously thought. 

Dicer cleavage of shRNA precursors has been reported to 
occur at nucleotide positions 21 to 25 from the hairpin ends. 
Such results are typically obtained using northern blotting, 
leading to the commonly held assumption that the variety 
of putative siRNAs processed out of an shRNA was equivalent 
to the handful of bands observed under electrophoresis. 
Moreover, as 5' RACE experiments suggested a single cleavage 
site on target RNAs, it was commonly assumed that the 
guide strands of these few siRNAs shared the same 5' end 
but varied in their 3' end. Instead, by implementing NGS, we 
were able to show that an shRNA can be processed into up 
to nearly 100 different putative siRNA strands. The clue to 
this apparent contention between the two techniques is not 
the length of the strands processed from shRNAs, but rather 
the dramatic variability in the sequences represented within 
each of these lengths (Supplementary Table S2). Thus, in 
their vast majority, the strands processed out of each shRNA 
might vary substantially in terms of 5' and 3' ends, but yield 
a strand population with sum strand lengths of ~21 nt 
(Figure 2). 

At first glance, these results could be dismissed as artefacts. 
After all, while double stranded RNA is partially protected 
from nuclease activity in cells, sample preparation 
for NGS provides ample opportunity for degradation. Yet 
comparative analysis of the reads aligning to either TT-034 or 
miRNA precursors within the same sequencing run suggests 
otherwise. Thus, the distribution of the 5' ends of putative 
siRNA strands was both wide and asymmetrical, as opposed 
to the consistent Gaussian distribution of 3' ends in strands 
with common 5' ends (Supplementary Table S2). This irregularity 
suggested degradation might indeed be the source of 
variability at the 3' end of putative siRNA strands, but not at 
the 5' end. In stark contrast, microRNA-aligning sequences 
demonstrated distinct 5' ends but variable 3' ends in agreement 
with previous reports, indicating lack of degradation 
amongst miRNAs. Together, these data pointed towards both 
lack of sample degradation, and an intact Dicer mechanism 
in our model. 

Recently, a novel, Dicer-independent pathway of miRNA
maturation has been elegantly demonstrated around miR-451
biogenesis. Interestingly, the maturation of miR-451 is characterized
by strong guide strand processing bias,^^"^° a feature
also of shRNA22 maturation (Figure 2c,f). This could indicate
that shRNA22 might also be processed in a Dicer-indepen- 
dent manner, that the passive strand of shRNA22 is unusu- 
ally unstable, or that the processing products are beyond the 
sensitivity of NGS. Although studies have been undertaken 
to assess if a miR-451 scaffold, which is crucial for Dicer- 
independent processing, can be used for shRNA design 
purposes^" there is only partial evidence that such constructs 
are matured in a Dicer-independent manner.^^ More impor- 
tantly, however, the decisive feature that drives miR-451 matu- 
ration independently of Dicer is the requirement for a short, 
17 nt long hairpin stem, which is absent in the TT-034 shR- 
NAs.^^ The same, short stem feature was also corroborated 
recently in shRNA maturation engineering studies, which 
implicated AGO2 in Dicer-independent <19 bp stem shRNA
(or AgoshRNA) maturation. Elsewhere, the re-examination 
of a dataset from a single in vitro Dicer knockdown study 
has raised the prospect that more miRNAs as well as other 
small noncoding RNAs might undergo Dicer-independent 
maturation. Thus, despite the limited structural similarity of 
shRNA22 to the primary transcript of miR-451 or published 
AgoshRNAs, a Dicer-independent pathway might be involved 
in the processing of this hairpin. The increased insight of strand 
processing afforded by sRNA-seq compared to northern blot- 
ting studies might serve to further clarify shRNA tailoring for 
strand selection and processing mechanism access control. 
Nevertheless, given that irrespective of maturation pathway 
all TT-034-encoded hairpins are processed into tens of differ- 
ent potential guide strands, we sought to understand how this 
might affect the RNAi potential of the ensuing siRNAs. 

To address this conundrum, we resorted to classical 5' 
RACE experiments. We reasoned that such an increased 
number of putative siRNAs would result in an accordingly 
higher number of cleavage points on the targeted RNA. Con- 
trary to our expectations, electrophoretic analysis suggested 
that only a single siRNA-directed restriction event was
mediated per shRNA, yet Sanger sequencing consistently 
resolved this band into at least two novel 5' ends per target 
site. These results were reproduced both between biologi- 
cal replicates and with s-shRNA analogues, even where the 
s-shRNA design forced cleavage upstream from its vector-encoded
counterpart. Importantly, putative guide strands that
could be responsible for the generation of these novel 5' ends 
on the HCV replicon were identified amongst the sRNA-seq 
data for all hairpins. Hence, these results raised the possibil- 
ity that our observations were not the outcome of degradation 
of the target RNA in a 5'→3' direction, but the consequence
of more than one putative siRNA guide strand being active
through an RNAi mechanism. 



Perplexingly, these findings were reproduced even with 
high purity synthetic siRNA analogues. Closer inspection 
of the capillary sequencing trace indicated that a third 5' 
RACE product might indeed be present, but at levels below 
those acceptable in Sanger population sequencing analysis. 
We therefore concluded that a more sensitive method was 
required. Thus, we engaged in developing RACE-seq, rea- 
soning that at sufficient depth the technique would be able 
to resolve all novel 5' ends generated on an RNA target after 
exposure to RNAi mediators. To maximize fidelity, we opted for
paired-end sequencing, and took care to generate cDNA compatible
with the read length limitations of this approach (i.e.,
<80 bp).^'' Thus, the need for mechanical/enzymatic shearing
of long cDNA that could give rise to spurious 5' ends or read 
elimination on account of sequencing errors due to cDNA 
length was avoided. Our control reactions confirmed the 
specificity of our approach, and the validity of the method was 
further demonstrated in the replication of previous findings on 
preferential strand activity (Figure 4d,f) observed in reporter 
construct and qPCR studies. In stark contrast, an excess 
of 10** reads corresponding to a primary novel 5' end at the 
canonical restriction site directed by siRNA22 were detected 
(Supplementary Figure S2). In addition, substantial reads
corresponding to additional novel 5' ends juxtaposed to the
primary cleavage site in the 5'→3' orientation were detected
for all siRNAs tested. The frequency of these secondary and 
tertiary reads was incrementally lower per nucleotide step 
(-1, -2, etc) in a Gaussian fashion, alluding to 5'->3' RNA 
exonuclease or endonuclease activity (degradation). Intrigu- 
ingly, novel 5' ends in the HCV replicon genome were also 
detected at position +1 upstream of the expected cleavage 
point for siRNA6 (Figure 4a) and siRNA22 (Figure 4e) at 
apparently negligible frequencies (0.2 and 1.3%). Yet the most
unexpected finding was obtained with siRNA19 (Figure 4c). 
In this case a cumulative 25% frequency of novel 5' ends at
positions +1 to +5 was encountered, with position +3 representing
>14% of reads, an amount notable both at logarithmic
and linear data representation scales. These results
indicated that siRNA19 might direct on-target RNAi activity 
up to 5 bases beyond position 10-11 from the 5' end of the
guide strand, with position 13-14 being a secondary cleav- 
age point. 
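
The arithmetic behind this statement can be written out as a short sketch. It assumes the convention used above, namely that offset 0 denotes the canonical cut between guide strand positions 10 and 11 (counted from the guide 5' end) and that an offset of +k on the target shifts both positions by k; the function name is hypothetical.

```python
def guide_positions_for_offset(offset, canonical=(10, 11)):
    """Map a RACE-seq 5' end offset to the guide strand positions flanking the cut.

    offset: shift of the novel 5' end relative to the expected cleavage point
            (assumed convention: +k moves the cut k bases along the guide).
    canonical: guide positions flanking the expected cut (10 and 11).
    """
    return (canonical[0] + offset, canonical[1] + offset)

# Offsets +1 to +5 as observed for siRNA19; +3 maps to positions 13-14.
for k in range(0, 6):
    left, right = guide_positions_for_offset(k)
    print(f"offset +{k}: cut between guide positions {left} and {right}")
```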

One possible explanation for this result could be that the 
guide strand of siRNA19 carries two mismatches to the HCV 
replicon genome at its 5' end. However, this contrasts with the
+3 position of the secondary cleavage point and thus might 
not account for our observations, even after cytosolic 5' end 
processing of siRNAs.^'^ Alternatively, the presence of 5' mis- 
matches to the target sequence might alter the "molecular 
ruler" properties of siRNA in AGO2 to define a single cut site.
After all, the sense strand of siRNA6, which is active against 
the antisense RNA strand of the HCV replicon genome 
(Figure 4b and ref. ^^), does not contain any 5' end mis- 
matches and does not display any inexplicable cleavages. On 
the other hand, the active strand of siRNA22 also lacks 5' 
mismatches to its target yet still was found to result in target
cleavage beyond the expected position (Figure 4e). Thus, our
results indicate that AGO2 slicer activity might not be as precise
as previously proposed,^^ and that this precision might
be driven, at least in part, by a siRNA sequence component. 
siRNA6 and especially siRNA22 also exhibited the capacity
to generate novel 5' ends well beyond position 10-11, proximal
to the siRNA guide strand 3' end. Importantly, while the
detection and extent of these additional products appeared 
to be relative to RACE-seq depth (Supplementary Figure 
S2c,d), they were found at levels 10^-10^ lower than the pri- 
mary product. One possible explanation for these cleavages 
could involve RNase H, a double stranded nucleic acid endonuclease
typically exploited in antisense approaches.^** However,
such a mechanism is unlikely: RNase H loading would
lead to the destruction of both siRNA strands. Moreover, no
cleavages were detected in the antisense RNA intermedi- 
ate of the HCV replicon genome at the hybridization sites 
of siRNA19 and siRNA22. Another possibility would involve 
AGO1 loading of the siRNAs. This would direct sequestration
and eventual degradation of the replicon genome in P-bodies,
in a microRNA-like mechanism currently hypothesized to be 
a low turnover process.^'' The scarcity of these events in our 
data is in support of this tenet. RACE-seq might therefore 
be a useful tool in experimental validation of computationally 
predicted miRNA recognition elements. Such an approach
would be complementary to Cross-Linking Immuno-Precip- 
itation NGS (CLIP-seq),^** which provides a more physiologi- 
cally relevant solution to overexpression reporter constructs,^" 
but gives only a global overview of potential mRNA::miRNA 
interactions. 

One of the main risks of RNAi, especially in a therapeu- 
tics context, is its strong potential for off-target activity. Does 
the unexpected plethora of putative siRNA guide strands
produced from shRNA precursors present such a risk? Our 
data indicate that strand prevalence may not necessarily 
drive potency. Thus, putative shRNA22 guide strands with 
the intended 5' end were the most commonly observed ones 
according to sRNA-seq (Figure 2). Analogues of these were 
also shown capable of inducing potent AGO2-mediated RNAi
according to both reporter and 5' RACE assays (Tables 1, 2,
and 4). Still, they were not the primary drivers of shRNA22 
activity according to 5' RACE (Table 2). Collectively, these 
results implicate preferential guide strand loading onto RISC 
as the crucial step in enabling potency. Dicer is known to play 
a fundamental role in siRNA selection prior to activation of 
the RISC complex.'""'^ As Dicer is also involved in shRNA 
maturation, it is conceivable that only a fraction of the puta- 
tive guide strands it produces exhibit the appropriate charac- 
teristics for RISC loading. These properties are still unclear 
although they appear sequence-dependent,"^ as our focused 
studies on shRNA22 maturation products indicate (Table 
4). Further investigations on the fundamental mechanisms 
of putative shRNA strand loading onto AGO1 and AGO2 are
likely to require techniques capable of identifying sequences 
associated with the RISC complex, such as co-immunopre- 
cipitation.'*^'"' With respect to the clinical potential of TT-034, 
bioinformatic analysis of the putative guide strands gener- 
ated by the three shRNAs indicates that these do not align 
to either the human or mouse transcriptome suggesting that 
off-target effects by these sequences, even if RISC-loaded, 
should be limited. 

Accordingly, TT-034 has been repeatedly found to be 
nontoxic in vitro, in rodents and in nonhuman primates 
even at high doses ensuring comprehensive hepatocyte
transduction." Moreover, our replicon efficacy studies have
demonstrated strong antiviral activity,^^ which, as presented 
herein, is in accordance with a siRNA mechanism of action. 
This efficacy is sustained despite the existence of single mis- 
matches in the replicon genome in two of the three TT-034 
target sites (Supplementary Table S7), or up to four 5' 
end mismatches between the most active shRNA22 guide 
strands and the replicon genome. Mismatch tolerance was 
also observed with both the s-shRNA and siRNA analogues 
(Tables 1 and 4) within an AGO2-mediated RNAi context
(Table 2). Our data therefore suggest that TT-034 might have 
a wide spectrum of activity not limited to HCV genomes car- 
rying at least one fully homologous target sequence. 

The capacity to tolerate mismatches is important for an 
antiviral RNAi-based agent, particularly against a pathogen 
exquisitely equipped for resistance development through 
mutational escape. This is a well-described problem in HCV 
therapy on account of its error-prone RNA-dependent RNA
polymerase."^ Herein, perhaps, might lie the biggest advan- 
tage of TT-034, in its capacity to generate a diverse panel 
of putative guide strands. Thus, mutations arising within the 
shRNA-targeted region might be tolerated by TT-034, allow- 
ing it to continue to suppress the virus effectively. Our in vitro 
observations so far suggest this might be the case,^^ espe- 
cially if one takes into account the mismatches in the replicon 
system and the inherent positive selection bias towards viral 
escape for cell propagation under G418 antibiotic selection. 
Nevertheless, our view is that the genomic adaptation of HCV 
required for maintenance in culture restricts extrapolation to 
the patient milieu. Similarly, we would caution direct compari- 
sons to observations with other viral targets of entirely differ- 
ent nature such as the host genome-integrating HIV retrovirus. 

Interestingly, RNAi susceptibility studies on HCV using 
reporter constructs have indicated that the (-) strand RNA 
intermediate of the HCV genome may not be accessible to 
RNAi pathways.""^ Our RACE-seq results, however, suggest
otherwise for TT-034, at least for shRNA6, introducing
additional therapeutic advantages by extending the number 
of loci targeted to four, and involving both versions of the 
HCV genome. On the other hand, the discrepancy between 
these two studies raises important questions on the extent 
to which explicit 5'-RACE or RACE-seq data confirming an 
RNAi mode of action might be necessary in drawing robust 
conclusions out of RNAi experiments. 

In conclusion, in this work, we have applied NGS to scru- 
tinize the biological processing and mechanism of action 
of the novel RNAi agent TT-034. To our knowledge, this is 
the first study engaging in such a detailed survey of these
core mechanisms of therapeutic RNAi mediators. Our stud- 
ies have revealed unexpected complexity in both shRNA 
maturation and siRNA activity supporting further studies on 
the basic processes governing Dicer-dependent RNAi agent 
processing and siRISC bioactivity. However, our results have 
also demonstrated an on-target mechanism of action for 
TT-034 against both the (+) and (-) strands of the HCV RNA
genome. Coupled to its established safety in rodents and 
nonhuman primates" and its apparent capacity to suppress 
HCV from resistance development,^^ these findings justify 
progression of TT-034 to the clinic to investigate its potential 
benefit for the treatment of chronically infected HCV patients. 

Materials and methods

Cells. Con1b replicon cells (licensed from ReBLikon GmbH,
Schriesheim, Germany) refer to a Huh-7-derived cell line
supporting the replication of subgenomic replicon RNA from
the Con1 strain of HCV genotype 1b. The replicon encodes
the neomycin phosphotransferase gene for selection as well
as the firefly luciferase reporter gene to monitor its expression
in cells. Con1b replicon cells were grown in Dulbecco's
modified Eagle's medium (Gibco, Paisley, UK) supplemented
with 10% fetal calf serum, 1 mmol/l sodium pyruvate, 1x non-essential
amino acids, 1x penicillin/streptomycin and 500 µg/ml
G418.

TT-034 and synthetic RNAs. TT-034 was produced by
triple transfection of HEK293 cells using an AAV helper
virus system (Stratagene, Stockport, UK). Viruses were
purified using cesium chloride gradient centrifugation
as previously described"'' and the titer was determined
by quantitative real-time PCR [forward primer:
5'-AGGTGGAGGGTGAGTGGTTTTT-3', reverse primer:
5'-GAAGGATGATGAGAAGGGTTGA-3' and probe:
5'-FAM-TTGTTGAAGGGGGGAAGGTGGTG-TAMRA-3' (Applied Biosystems)]. The
titer was expressed in vector genomes/milliliter (vg/ml) and
the dose used in terms of vector genomes per Huh7 cell
(vg/cell). s-shRNA and siRNA (Table 1) were manufactured
by Integrated DNA Technologies and purified with PAGE
and ion-exchange HPLC. Synthetic RNA purity (>99%) was
confirmed internally by mass spectrometry as previously
described.^''

TT-034, s-shRNA and siRNA activity in Con1b replicon cells.
Con1b replicon cells were transduced with TT-034 by incubating
the appropriate amount of vector with the cells in a
2 ml volume. After 4 hours, an additional 8 ml of complete
medium was added and the cells were incubated further for
68 hours until harvested. s-shRNA and siRNA were reverse-transfected
into cells using Dharmafect3 (Dharmacon,
Epsom, UK) as summarized in Table 1. To ensure consistent
nucleic acid concentration during transfection, up to 1
µmol/l of nonspecific oligonucleotide [5'-GAGGAGTTGGGAGGGATG-3']
was added. Upon transduction or transfection,
the Con1b replicon cells were incubated for 48 hours then
harvested using 0.25% Trypsin-EDTA (Gibco). An aliquot
was lysed to measure the luciferase activity and cytotoxicity.
Luciferase activity was measured using the BriteLite
assay system (Perkin Elmer, Cambridge, UK) as previously
described. Cytotoxicity measurements were conducted
using CellTiter-Glo according to the manufacturer's instructions
(Promega, Southampton, UK).

NGS of shRNA and miRNA maturation and data analysis.
Con1b replicon cells were transduced with TT-034 at a dose
of 30,000 vg/cell. After 3 days incubation, the cells were
harvested and RNA was extracted using Trizol. Small RNA
libraries were constructed (DNAVision, Charleroi, Belgium)
using the Small RNA Sample Prep kit (Illumina, San Diego,
CA) according to the manufacturer's instructions. Briefly,
small RNA was gel purified from 6 µg of total RNA, 5' and
3' adaptors were ligated, followed by reverse transcription
and paired-end library enrichment by amplification (Illumina).
The cDNA library was purified and validated for size, quality
and concentration using an Agilent Bioanalyzer 2100 (Agilent
Technologies, Santa Clara, CA). Each small RNA cDNA
library was denatured and hybridized to an 8-lane flow cell,
followed by cluster generation using isothermal amplification
with a DGE-Small RNA Cluster Generation Kit v1.0 (Illumina).
The prepared flow cell was sequenced for 75 cycles using a
Genome Analyzer IIx (Illumina), according to the manufacturer's
instructions.

In order to obtain interpretable data, only sequences with
full-length adapters at both termini were included in downstream
analyses. Low quality 3' tails, 3' polyA tails (a common
artefact of the Illumina sequencing technology)'"' and adapter
sequences were trimmed from the reads. Moreover, we filtered
out sequencing reads of less than 15 nucleotides in
order to eliminate siRISC-restricted siRNA passenger strand
fragments. Given that both the forward and reverse reads
covered the entirety of the mature species sequenced, reads
with any mismatches between the two were discarded to
yield a high-confidence data set. To identify shRNA-derived
sequences, assembled species were aligned to the TT-034
reference sequence using Bowtie v0.12.5.''^ To investigate
microRNA (miRNA) diversity, sequencing data were similarly
aligned to the 1527 miRNA hairpin sequences in miRBase
v.18.^^ Only species with a perfect match to TT-034 or pre-miRNA
hairpins were included in the downstream analysis.
The results presented are representative of the analysis of
two independent samples.
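
The filtering logic described above can be sketched as follows. This is a simplified illustration rather than the pipeline actually used: the adapter strings, the reference sequence, the polyA run threshold and all function names are hypothetical, quality trimming is omitted, the adapter check is reduced to locating the 3' adapter in each mate, and a plain substring test stands in for the perfect-match Bowtie alignment.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def clean_read(seq, adapter, min_polya=5):
    """Remove the 3' adapter and any trailing polyA run; None if no adapter is found."""
    idx = seq.find(adapter)
    if idx == -1:
        return None                      # adapter missing: read is excluded
    insert = seq[:idx]
    # Assumed threshold: only runs of >= min_polya trailing A's count as polyA tails.
    return re.sub(r"A{%d,}$" % min_polya, "", insert)

def filter_pair(fwd, rev, adapter_fwd, adapter_rev, reference, min_len=15):
    """Return the high-confidence insert for a read pair, or None if it fails a filter."""
    f = clean_read(fwd, adapter_fwd)
    r = clean_read(rev, adapter_rev)
    if f is None or r is None:
        return None
    if len(f) < min_len:                 # drop fragments shorter than 15 nt
        return None
    if f != reverse_complement(r):       # any forward/reverse disagreement is discarded
        return None
    if f not in reference:               # stand-in for a perfect-match alignment
        return None
    return f

# Hypothetical reference, adapters and read pair, for illustration only.
REFERENCE = "GGCTAGCTAGGATCCGATTACAGGCATTCGA"
ADAPTER_FWD = "TCGTATGCCG"
ADAPTER_REV = "GATCGTCGGA"
insert = REFERENCE[3:24]
fwd = insert + ADAPTER_FWD
rev = reverse_complement(insert) + ADAPTER_REV
print(filter_pair(fwd, rev, ADAPTER_FWD, ADAPTER_REV, REFERENCE))
```

Requiring exact agreement between the two mates is deliberately strict: because both reads span the whole insert, a single discordant base is enough to exclude the pair.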

Sanger 5' RACE and NGS-adapted 5' RACE (RACE-seq)
analysis. Con1b replicon cells were transduced with TT-034
or reverse-transfected with synthetic siRNAs or shRNAs at
the concentrations indicated in Table 1. After 48 hours and
following luminometric confirmation of antiviral activity and
absence of cytotoxicity, RNA was extracted using the Qiagen
RNAminiprep kit (Qiagen, Crawley, UK). For Sanger sequencing,
5' RACE analysis was performed on 2 µg of total RNA
using the GeneRacer kit (Invitrogen, Carlsbad, CA) according
to the manufacturer's instructions with the exception that
the RNA was directly ligated to the kit 5' adapter. cDNA generation
and RACE amplification were performed using the kit
adapter and site-specific primers listed in Supplementary
Table S1. Following amplification, each RACE mixture was
loaded and resolved on a 1.6% agarose gel. Fragments of
the approximate expected size were excised from the gel,
extracted using the Qiaquick Extraction kit (Qiagen) and
directly submitted for Sanger sequencing with 20% resolution
per base (Lark Technologies, Takeley, UK). Sequence
analysis was performed using FinchTV v1.4.0 (PerkinElmer).

For RACE-seq, a truncated GeneRacer 5' RACE adapter
(5'-GGAGAGUGAGAUGGAGUGAAGGAGUAGAAA-3') was
used in a 5' RACE reaction with a specific set of primers suitable
for high fidelity paired-end sequencing (Supplementary
Table S1). Those primers were designed to produce amplicons
of less than 80 bp with no overlap with the siRNA hybridization
site, thereby avoiding accidental siRNA sequencing.
Primers were designed for the (+) RNA and (-) RNA orientation
of the HCV genome, as both might be targeted by
TT-034-encoded shRNA. Total RNA 5'-end ligation and cDNA
generation was performed using the GeneRacer kit according
to the manufacturer's instructions. A touchdown qPCR
protocol was used to improve amplicon specificity. Briefly,
GeneRacer kit PCR reagents were supplemented with primer
to the truncated 5'-RNA adapter and gene-specific NGS PCR
primers at 900 nmol/l concentrations each: 95 °C denaturation
(15 seconds) was followed by annealing-extension at
70-60 °C over 20 cycles (0.5 °C steps, 15 seconds), followed
by a further 20 cycles of amplicon re-amplification at 60 °C
(15 seconds) and a 72 °C final extension step for 1 minute.
PCR products were submitted to DNAVision (Charleroi, Belgium)
for library construction and paired-end NGS as described
above. Resulting data were quality processed as indicated
above, the adapter sequences were removed and sequences
with >10 bp length and no mismatches to the Con1b genome,
as determined by Bowtie, were selected. Due to the high
incidence of complementary sequences in the 5' end of the
HCV genome, the alignments were forced according to the
orientation of the HCV genome targeted by the 5' RACE
primer under investigation ((+) RNA or (-) RNA). Results
were expressed as the incidence of novel 5' ends generated
on the target sequence within the siRNA hybridization site in
a logarithmic scale on account of Gaussian distribution profiles
at logarithmic versus linear scales (e.g., Figure 4 versus
Supplementary Figure S2).
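
The way such incidence profiles can be tabulated is sketched below. The sketch assumes that each aligned RACE-seq read has been reduced to the target coordinate of its 5' end on the (+) strand, and that positive offsets denote positions upstream (lower coordinate) of the expected cleavage point; the function name and the toy counts are hypothetical and do not reproduce our data.

```python
import math
from collections import Counter

def five_prime_end_profile(end_positions, expected_site):
    """Tabulate novel 5' ends relative to the expected cleavage point.

    end_positions: target coordinate of the 5' end of each aligned read.
    expected_site: coordinate of the 5' end produced by canonical cleavage.
    Offsets are computed as expected_site - position (assumed sign convention).
    """
    counts = Counter(expected_site - pos for pos in end_positions)
    total = sum(counts.values())
    profile = {}
    for offset in sorted(counts):
        n = counts[offset]
        profile[offset] = {
            "reads": n,
            "percent": 100.0 * n / total,   # linear representation
            "log10_reads": math.log10(n),   # logarithmic representation
        }
    return profile

# Toy data: a dominant canonical product, two downstream steps, one upstream site.
toy_ends = [100] * 900 + [101] * 60 + [102] * 15 + [97] * 25
profile = five_prime_end_profile(toy_ends, expected_site=100)
for offset, row in profile.items():
    print(offset, row["reads"], round(row["percent"], 1), round(row["log10_reads"], 2))
```

Reporting both the percentage and the log10 read count per offset makes it possible to judge whether a minor product is visible only on a logarithmic scale or also on a linear one.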

Supplementary material 

Figure S1. Predicted profile of RACE-seq products generated
by siRNA experiments.

Figure S2. Impact of 5' RACE reaction optimization and NGS
depth on detection sensitivity of novel 5' ends by RACE-seq.

Table S1. Primer sequences for 5' RACE and RACE-seq
on the (A) shRNA6, (B) shRNA19, and (C) shRNA22 target
sites.

Table S2. Diversity of ssRNA strands processed out of
TT-034-encoded shRNA hairpins expressed in TT-034-treated
Con1b HCV replicon cells as determined by Illumina paired-end
small RNA sequencing.

Table S3. Diversity of mature miRNA sequences in
TT-034-transduced, Con1b replicon Huh7 cells as determined
by Illumina paired-end small RNA sequencing.

Table S4. Number and sequence of Con1b HCV replicon
aligning reads obtained by paired-end small RNA NGS following
transduction of Con1b HCV replicon cells with TT-034.

Table S5. 5'-RACE outputs of TT-034 target regions following
treatment of Con1b HCV replicon cells with TT-034,
synthetic shRNA analogues (s-shRNA) or siRNA analogues
of TT-034-encoded shRNA6, shRNA19, and shRNA22 as
compared to the HCV replicon target sequence (cleavage
site identified by "*").

Table S6. Full 5'-RACE outputs of TT-034 target regions following
treatment of Con1b HCV replicon cells with TT-034
(ambiguous 44 nt region in red).

Table S7. TT-034 shRNA homology to Con1b strain (GenBank
AJ238799).